Likelihood / Regression

The other side of the coin

Statistics: Interested in estimating population-level characteristics; i.e., the parameters

\[ y \rightarrow f(y|\boldsymbol{\theta}) \]

The Likelihood Function

What can we say about our parameters using this function?

\[ \begin{align*} \mathcal{L}(\boldsymbol{\theta}|y) = P(y|\boldsymbol{\theta}) = f(y|\boldsymbol{\theta}) \end{align*} \]

The likelihood (\(\mathcal{L}\)) of the unknown parameters, given our data, can be calculated using our probability function.

CODE:

# A data point
  y=c(10)

# The likelihood that the mean is 8, given our data (sd defaults to 1)
  dnorm(y,mean=8)
[1] 0.05399097


If we knew the mean were truly 8, this value would also be the probability density of the observation y = 10.

Many Parameter Guesses

# Let's take many guesses of the mean
  means=seq(0,20,by=0.1)

# Use dnorm to get likelihood of each guess of the mean
# Assumes sd = 1
  likelihood=dnorm(y, mean=means)
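To pick out the best guess from this sketch (a small addition, not in the original slides), find the mean with the highest likelihood:

```r
# The guess with the highest likelihood is the data point itself
y <- c(10)
means <- seq(0, 20, by = 0.1)
likelihood <- dnorm(y, mean = means)  # sd = 1 assumed, as above
means[which.max(likelihood)]
# [1] 10
```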

Statistics and PDF Example

What is the mean height of King Penguins?

We go and collect data,

\(\boldsymbol{y} = \begin{bmatrix} 4.34 & 3.53 & 3.75 \end{bmatrix}\)


Let’s decide to use the Normal Distribution as our PDF.

\[ \begin{align*} f(y_1 = 4.34|\mu,\sigma) &= \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}(\frac{y_{1}-\mu}{\sigma})^2} \\ \end{align*} \]

AND

\[ \begin{align*} f(y_2 = 3.53|\mu,\sigma) &= \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}(\frac{y_{2}-\mu}{\sigma})^2} \\ \end{align*} \]

AND

\[ \begin{align*} f(y_3 = 3.75|\mu,\sigma) &= \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}(\frac{y_{3}-\mu}{\sigma})^2} \\ \end{align*} \]

Or simply,

\[ \textbf{y} \stackrel{iid}{\sim} \text{Normal}(\mu, \sigma) \]

\(iid\) = independent and identically distributed

Continued

The joint probability of our data with shared parameters \(\mu\) and \(\sigma\),

\[ \begin{align*} & P(Y_{1} = y_1,Y_{2} = y_2, Y_{3} = y_3 | \mu, \sigma) \\ &= \mathcal{L}(\mu, \sigma|\textbf{y}) \end{align*} \]

IF each \(y_{i}\) is independent, the joint probability of our data is simply the product of the three probability densities,

\[ \begin{align*} =& f(y_{1}|\mu, \sigma)\times f(y_{2}|\mu, \sigma)\times f(y_{3}|\mu, \sigma) \end{align*} \]

We can do this because we are assuming that knowing one value (\(y_1\)) gives us no new information about another value (\(y_2\)).

\[ \begin{align*} =& \prod_{i=1}^{3} f(y_{i}|\mu, \sigma) \\ =& \mathcal{L}(\mu, \sigma|y_{1},y_{2},y_{3}) \end{align*} \]

Code

Translate the math to code…

# penguin height data
  y=c(4.34, 3.53, 3.75)

#Joint likelihood of mu=3, sigma =1, given our data
  prod(dnorm(y,mean=3,sd=1))
[1] 0.01696987


Calculate the likelihood of many guesses of \(\mu\) and \(\sigma\) simultaneously,

# The Guesses
  mu=seq(0,6,0.05)
  sigma=seq(0.01,2,0.05)
  try=expand.grid(mu,sigma)
  colnames(try)=c("mu","sigma")

# function
fun=function(a,b){
  prod(dnorm(y,mean=a,sd=b))
  }

# mapply the function with the inputs
  likelihood=mapply(a=try$mu,b=try$sigma, FUN=fun)

# maximum likelihood of parameters
  try[which.max(likelihood),]
      mu sigma
925 3.85  0.36
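As a sanity check (this comparison is not in the original slides), the grid search should land near the closed-form maximum likelihood estimates for a Normal sample, \(\hat{\mu} = \bar{y}\) and \(\hat{\sigma} = \sqrt{\frac{1}{n}\sum_{i}(y_{i}-\bar{y})^2}\):

```r
# Closed-form MLEs for a Normal sample (note: n, not n - 1, in the variance)
y <- c(4.34, 3.53, 3.75)
mu.hat    <- mean(y)
sigma.hat <- sqrt(mean((y - mean(y))^2))
round(c(mu.hat, sigma.hat), 2)
# [1] 3.87 0.34
```

The grid points 3.85 and 0.36 are simply the closest guesses the grid tried.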


Likelihood plot (3D)
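The surface on this slide can be re-drawn from the same grid (a sketch that recomputes the grid from scratch so it runs standalone):

```r
# Likelihood surface over (mu, sigma), drawn as contours
y <- c(4.34, 3.53, 3.75)
mu    <- seq(0, 6, 0.05)
sigma <- seq(0.01, 2, 0.05)
L <- outer(mu, sigma, Vectorize(function(a, b) prod(dnorm(y, mean = a, sd = b))))
contour(mu, sigma, L, xlab = "mu", ylab = "sigma", main = "Likelihood surface")
```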

Sample Size

What happens to the likelihood if we increase the sample size to N=100?
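A small simulation (not from the slides) suggests the answer: with more data, the likelihood becomes more sharply peaked around the true mean. The sample sizes and true mean below are hypothetical.

```r
# Compare the width of the likelihood peak for N = 3 vs. N = 100
set.seed(1)
means  <- seq(3, 5, by = 0.01)
loglik <- function(y) sapply(means, function(m) sum(dnorm(y, mean = m, sd = 1, log = TRUE)))
y3   <- rnorm(3,   mean = 4)   # hypothetical small sample
y100 <- rnorm(100, mean = 4)   # hypothetical large sample

# Width of the interval within 2 log-likelihood units of the maximum
width <- function(ll) diff(range(means[ll > max(ll) - 2]))
width(loglik(y3)) > width(loglik(y100))
# [1] TRUE
```

More data, narrower peak: our uncertainty about \(\mu\) shrinks.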

Proper Guessing

Let’s let the computer do some smarter guessing, i.e., optimization.

#Note: optim function uses minimization, not maximization. 
#THUS, need to put negative in our function

#Note: log=TRUE, allows us to add rather than multiply 
#      (sum, instead of prod)

neg.log.likelihood=function(par){
  -sum(dnorm(y,mean=par[1],sd=par[2],log=TRUE))
  }

#find the values that minimize the function
#c(1,1) are the initial values for mu and sigma
#(note: a lower bound of 0 for sd is fragile -- dnorm is undefined at sd = 0)
fit <- optim(par=c(1,1), fn=neg.log.likelihood,
             method="L-BFGS-B",
             lower=c(0,0),upper=c(10,1))

#Maximum likelihood estimates for mu and sigma
fit$par
[1] 3.901219 0.927352

The Linear Regression way

King Penguin Height Data (N=100)

out=lm(y~1)
summary(out)

Call:
lm(formula = y ~ 1)

Residuals:
     Min       1Q   Median       3Q      Max 
-3.00136 -0.61581  0.01208  0.67407  2.58369 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   3.9012     0.0932   41.86   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.932 on 99 degrees of freedom
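Notice the two \(\sigma\) estimates differ slightly: optim returned 0.927, while lm reports a residual standard error of 0.932. That is not a bug: the MLE divides the sum of squares by \(n\), while lm divides by \(n - 1\). Converting one to the other (using only the numbers reported above) recovers the match:

```r
# ML estimate of sigma (denominator n) vs. lm's residual standard
# error (denominator n - 1)
sigma.mle <- 0.927352  # from optim, above
n <- 100
round(sigma.mle * sqrt(n / (n - 1)), 3)
# [1] 0.932
```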

Moments

Book Chapter (Section 3.4.4.)

Probability Functions have qualities called ‘Moments’

  • Moment 1: Expected Value (mean)

  • Moment 2: Variance

  • Moment 3: Skewness

  • Moment 4: Kurtosis

  • Moment k: …

Parameters are not always moments.
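For instance (an added example), the Gamma distribution's parameters, shape and rate, are not moments themselves; the moments are functions of them:

```r
# Gamma(shape, rate): the parameters are not moments, but determine them
shape <- 2
rate  <- 0.5
mean.gamma <- shape / rate    # first moment (expected value) = 4
var.gamma  <- shape / rate^2  # second central moment (variance) = 8
c(mean.gamma, var.gamma)
# [1] 4 8
```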